D3 Python Basics Review

17th Ironman Contest

max1112

2025-09-17 20:02:44


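The Three Foundational Python Packages for Data Science

NumPy – Efficient Numerical and Matrix Computation

NumPy (Numerical Python) is a Python package for scientific and numerical computing. It provides a list-like multidimensional array (ndarray) that handles large amounts of numerical data faster and with less memory than a Python list. It is mainly used for: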

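1. Vectorized operations: arithmetic is applied to whole arrays at once, so no explicit for loop is needed.
2. A large collection of mathematical functions (matrix operations, statistics, random number generation).

Common usage: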

import numpy as np

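arr = np.array([1, 2, 3])         # create a 1-D array
mat = np.array([[1, 2], [3, 4]])  # create a 2-D matrix

np.zeros((2, 2))      # matrix of all zeros
np.ones((3, 3))       # matrix of all ones
np.random.rand(2, 2)  # matrix of random values in [0, 1)

# Element-wise array operations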
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a + b   # [5 7 9]
a * b   # [ 4 10 18]
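
To make the two points above (vectorized operations and built-in math functions) more concrete, here is a minimal sketch. The variable names and sample values are made up for illustration, and `np.random.default_rng` is used here as one way to generate reproducible random numbers; it is not the only option.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: square each element one at a time
squared_loop = [v ** 2 for v in values]

# Vectorized version: one expression operates on the whole array
squared_vec = values ** 2

# Built-in statistics
total = values.sum()
mean = values.mean()

# Matrix multiplication (as opposed to element-wise *)
mat = np.array([[1, 2], [3, 4]])
product = mat @ mat

# Reproducible random numbers in [0, 1); the seed value is arbitrary
rng = np.random.default_rng(seed=0)
samples = rng.random((2, 2))

print(squared_vec)  # [ 1.  4.  9. 16.]
print(total, mean)  # 10.0 2.5
print(product)
```

Note that `*` on arrays multiplies element by element, while `@` performs matrix multiplication, so `mat @ mat` yields `[[7, 10], [15, 22]]`.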

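Pandas – Structured Data Processing

Pandas is built on top of NumPy and fills gaps in Python's support for data analysis and modeling. It provides two core structures, the one-dimensional Series and the two-dimensional DataFrame, which let users work with structured data more intuitively. Whereas NumPy focuses on numerical and matrix computation, Pandas emphasizes the readability and flexibility of data. With Pandas you can easily read and manipulate common formats such as CSV, Excel, or SQL databases, and reshape, slice, aggregate, and select subsets of data. It combines NumPy's computational performance, the convenience of spreadsheets, and the query features of relational databases, making it an indispensable tool for data preprocessing and analysis. It is mainly used for: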

1. Fast tools for reading and cleaning data.
2. Filtering, statistics, and group-by operations.

Common usage:

import pandas as pd

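# Create a DataFrame
data = {'Name': ['Tom', 'Alice'], 'Age': [25, 30]}
df = pd.DataFrame(data)

df.head()       # first five rows
df.info()       # summary of columns, dtypes, and non-null counts
df.describe()   # basic statistics for numeric columns

# Read/write files
df = pd.read_csv('data.csv')             # assumes data.csv exists and has an Age column
df.to_excel('output.xlsx', index=False)  # needs an Excel engine such as openpyxl installed

# Filtering
df[df['Age'] > 26]  # rows where Age is greater than 26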
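
The group-by operations mentioned above are not shown in the snippet, so here is a small sketch of filtering and grouping together. The `Team` column and the extra rows are hypothetical data added only for this illustration.

```python
import pandas as pd

# Hypothetical sample data; the Team column is an assumption for this example
df = pd.DataFrame({
    "Name": ["Tom", "Alice", "Bob", "Carol"],
    "Team": ["A", "B", "A", "B"],
    "Age": [25, 30, 35, 40],
})

# Filtering: a boolean mask selects the matching rows
older = df[df["Age"] > 26]

# Group-by: split rows by Team, then average Age within each group
mean_age = df.groupby("Team")["Age"].mean()

print(older["Name"].tolist())  # ['Alice', 'Bob', 'Carol']
print(mean_age)                # A -> 30.0, B -> 35.0
```

The result of `groupby(...).mean()` is a Series indexed by the group key, so a single value can be looked up with `mean_age["A"]`.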

Matplotlib – Basic Data Visualization

Matplotlib is Python's most basic plotting package. It can draw line charts, bar charts, scatter plots, and histograms.

import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20, 15]

plt.plot(x, y, label="Trend")
plt.title("Sample Line Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
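
The example above covers only the line chart. As a rough sketch of the other three chart types mentioned, the following draws a bar chart, a scatter plot, and a histogram side by side. The data values, the `Agg` backend (chosen so the script runs without a display), and the output file name `charts.png` are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file instead of opening a window
import matplotlib.pyplot as plt

# Made-up sample data
categories = ["A", "B", "C"]
counts = [3, 7, 5]
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One row of three subplots
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].bar(categories, counts)   # bar chart: one bar per category
axes[0].set_title("Bar Chart")

axes[1].scatter(x, y)             # scatter plot: one point per (x, y) pair
axes[1].set_title("Scatter Plot")

axes[2].hist(samples, bins=5)     # histogram: distribution of the samples
axes[2].set_title("Histogram")

fig.tight_layout()
fig.savefig("charts.png")         # hypothetical output file name
```

In an interactive session you could drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.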